Overview
Brought to you by YData
Dataset statistics
| | Dataset A | Dataset B |
|---|---|---|
| Number of variables | 12 | 12 |
| Number of observations | 446 | 446 |
| Missing cells | 449 | 428 |
| Missing cells (%) | 8.4% | 8.0% |
| Duplicate rows | 0 | 0 |
| Duplicate rows (%) | 0.0% | 0.0% |
| Total size in memory | 45.3 KiB | 45.3 KiB |
| Average record size in memory | 104.0 B | 104.0 B |
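The two derived rows can be recomputed from the raw figures above. This sketch assumes that "Missing cells (%)" divides by the full grid of observations × variables, and that "Average record size in memory" is the total size divided by the number of observations. The function names are illustrative and are not part of ydata-profiling.

```python
def missing_cells_pct(missing_cells: int, n_rows: int, n_cols: int) -> float:
    """Share of empty cells, in percent, over the full rows x columns grid."""
    return 100.0 * missing_cells / (n_rows * n_cols)


def average_record_size_bytes(total_kib: float, n_rows: int) -> float:
    """Average in-memory footprint of one row in bytes (1 KiB = 1024 B)."""
    return total_kib * 1024 / n_rows


# Figures from the "Dataset statistics" table: 446 observations, 12 variables.
for name, missing in (("Dataset A", 449), ("Dataset B", 428)):
    print(f"{name}: {missing_cells_pct(missing, 446, 12):.1f}% missing cells")
print(f"{average_record_size_bytes(45.3, 446):.1f} B per record")
```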
Variable types
| | Dataset A | Dataset B |
|---|---|---|
| Numeric | 5 | 5 |
| Categorical | 4 | 4 |
| Text | 3 | 3 |
Alerts
| Dataset A | Dataset B | Alert type |
|---|---|---|
| Sex is highly overall correlated with Survived | Alert not present in this dataset | High correlation |
| Survived is highly overall correlated with Sex | Alert not present in this dataset | High correlation |
| Age has 98 (22.0%) missing values | Age has 89 (20.0%) missing values | Missing |
| Cabin has 349 (78.3%) missing values | Cabin has 337 (75.6%) missing values | Missing |
| PassengerId has unique values | PassengerId has unique values | Unique |
| Name has unique values | Name has unique values | Unique |
| SibSp has 303 (67.9%) zeros | SibSp has 309 (69.3%) zeros | Zeros |
| Parch has 332 (74.4%) zeros | Parch has 341 (76.5%) zeros | Zeros |
| Fare has 9 (2.0%) zeros | Fare has 7 (1.6%) zeros | Zeros |
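The "High correlation" alerts come from a pairwise association measure compared against a configurable threshold. For two categorical columns such as Sex and Survived, a common measure is Cramér's V. The sketch below assumes that measure (without bias correction) and uses hypothetical contingency tables, because the report does not list the Sex × Survived cross-tabulation.

```python
import math


def cramers_v(table: list[list[int]]) -> float:
    """Cramér's V for an r x c contingency table of observed counts.

    Plain (non-bias-corrected) version: V = sqrt(chi2 / (n * (min(r, c) - 1))).
    Assumes every row total and column total is non-zero.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))


# Hypothetical 2 x 2 tables (rows: one column's categories, columns: the other's).
print(cramers_v([[50, 0], [0, 50]]))    # 1.0, perfect association
print(cramers_v([[25, 25], [25, 25]]))  # 0.0, independence
```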
Reproduction
| | Dataset A | Dataset B |
|---|---|---|
| Analysis started | 2025-01-20 16:51:20.284363 | 2025-01-20 16:51:22.706938 |
| Analysis finished | 2025-01-20 16:51:22.703694 | 2025-01-20 16:51:25.110202 |
| Duration | 2.42 seconds | 2.4 seconds |
| Software version | ydata-profiling v0.0.dev0 | ydata-profiling v0.0.dev0 |
| Download configuration | config.json | config.json |
Variables
PassengerId
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 435.42377 | 447.66592 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 7 |
| Maximum | 889 | 891 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 7 |
| 5-th percentile | 36.5 | 51.5 |
| Q1 | 216.25 | 234.25 |
| median | 434 | 437 |
| Q3 | 653.75 | 669.75 |
| 95-th percentile | 842 | 840.75 |
| Maximum | 889 | 891 |
| Range | 888 | 884 |
| Interquartile range (IQR) | 437.5 | 435.5 |
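Quantiles such as Q1 and Q3 are interpolated when they fall between two observations. This sketch assumes linear interpolation, the default method of pandas `Series.quantile` and NumPy `quantile`. It also shows why a column with more than 75% zeros, such as Parch in Dataset B, ends up with Q3 = 0 and an IQR of 0. The inputs are toy data.

```python
import math


def quantile(values, q):
    """Quantile with linear interpolation between the two closest ranks."""
    data = sorted(values)
    pos = (len(data) - 1) * q          # fractional index into the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def iqr(values):
    """Interquartile range: Q3 - Q1."""
    return quantile(values, 0.75) - quantile(values, 0.25)


print(quantile([1, 2, 3, 4], 0.25), quantile([1, 2, 3, 4], 0.75))  # 1.75 3.25
print(iqr([0, 0, 0, 0, 1]))  # mostly zeros, so Q1 = Q3 = 0
```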
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 256.90145 | 254.94299 |
| Coefficient of variation (CV) | 0.59000329 | 0.56949385 |
| Kurtosis | -1.1630314 | -1.2096078 |
| Mean | 435.42377 | 447.66592 |
| Median Absolute Deviation (MAD) | 219.5 | 218 |
| Skewness | 0.050012232 | 0.0090972841 |
| Sum | 194199 | 199659 |
| Variance | 65998.357 | 64995.926 |
| Monotonicity | Not monotonic | Not monotonic |
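Several rows of this table are functions of one another, which allows a quick consistency check: the coefficient of variation is the standard deviation divided by the mean, the variance is the square of the standard deviation, and the mean equals Sum divided by the number of observations. The sketch checks the PassengerId figures; the tolerances absorb rounding in the printed values. The helper name is illustrative.

```python
def derived_stats(std: float, total: float, n: int) -> dict:
    """Recompute mean, CV and variance from std, sum and observation count."""
    mean = total / n
    return {
        "mean": mean,
        "cv": std / mean,        # coefficient of variation
        "variance": std ** 2,    # variance is the squared standard deviation
    }


# PassengerId: std and Sum from the table above, 446 observations each.
print(derived_stats(256.90145, 194199, 446))  # Dataset A
print(derived_stats(254.94299, 199659, 446))  # Dataset B
```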
[Figure: histogram with fixed size bins (bins=50)]
Common values
| Value | Count | Frequency (%) |
| 686 | 1 | 0.2% |
| 806 | 1 | 0.2% |
| 414 | 1 | 0.2% |
| 887 | 1 | 0.2% |
| 79 | 1 | 0.2% |
| 793 | 1 | 0.2% |
| 308 | 1 | 0.2% |
| 260 | 1 | 0.2% |
| 202 | 1 | 0.2% |
| 525 | 1 | 0.2% |
| Other values (436) | 436 | 97.8% |
| Value | Count | Frequency (%) |
| 55 | 1 | 0.2% |
| 366 | 1 | 0.2% |
| 290 | 1 | 0.2% |
| 57 | 1 | 0.2% |
| 156 | 1 | 0.2% |
| 42 | 1 | 0.2% |
| 351 | 1 | 0.2% |
| 682 | 1 | 0.2% |
| 875 | 1 | 0.2% |
| 795 | 1 | 0.2% |
| Other values (436) | 436 | 97.8% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 1 | 1 | 0.2% |
| 2 | 1 | 0.2% |
| 3 | 1 | 0.2% |
| 4 | 1 | 0.2% |
| 6 | 1 | 0.2% |
| 7 | 1 | 0.2% |
| 10 | 1 | 0.2% |
| 14 | 1 | 0.2% |
| 15 | 1 | 0.2% |
| 16 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 7 | 1 | 0.2% |
| 8 | 1 | 0.2% |
| 12 | 1 | 0.2% |
| 15 | 1 | 0.2% |
| 16 | 1 | 0.2% |
| 18 | 1 | 0.2% |
| 21 | 1 | 0.2% |
| 22 | 1 | 0.2% |
| 25 | 1 | 0.2% |
| 27 | 1 | 0.2% |
Survived
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 0 | 0 |
| 2nd row | 0 | 1 |
| 3rd row | 0 | 1 |
| 4th row | 1 | 0 |
| 5th row | 0 | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 274 | 61.4% |
| 1 | 172 | 38.6% |
| Value | Count | Frequency (%) |
| 0 | 282 | 63.2% |
| 1 | 164 | 36.8% |
[Figures: histogram of category lengths; common values plot for Dataset A and Dataset B]
Words
| Value | Count | Frequency (%) |
| 0 | 274 | 61.4% |
| 1 | 172 | 38.6% |
| Value | Count | Frequency (%) |
| 0 | 282 | 63.2% |
| 1 | 164 | 36.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 274 | 61.4% |
| 1 | 172 | 38.6% |
| Value | Count | Frequency (%) |
| 0 | 282 | 63.2% |
| 1 | 164 | 36.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 446 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 446 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 0 | 274 | 61.4% |
| 1 | 172 | 38.6% |
| Value | Count | Frequency (%) |
| 0 | 282 | 63.2% |
| 1 | 164 | 36.8% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 446 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 446 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 0 | 274 | 61.4% |
| 1 | 172 | 38.6% |
| Value | Count | Frequency (%) |
| 0 | 282 | 63.2% |
| 1 | 164 | 36.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 446 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 446 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 0 | 274 | 61.4% |
| 1 | 172 | 38.6% |
| Value | Count | Frequency (%) |
| 0 | 282 | 63.2% |
| 1 | 164 | 36.8% |
Pclass
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 3 | 3 |
| 2nd row | 2 | 3 |
| 3rd row | 2 | 2 |
| 4th row | 2 | 1 |
| 5th row | 3 | 2 |
Common Values
| Value | Count | Frequency (%) |
| 3 | 251 | 56.3% |
| 1 | 102 | 22.9% |
| 2 | 93 | 20.9% |
| Value | Count | Frequency (%) |
| 3 | 248 | 55.6% |
| 1 | 109 | 24.4% |
| 2 | 89 | 20.0% |
[Figures: histogram of category lengths; common values plot for Dataset A and Dataset B]
Words
| Value | Count | Frequency (%) |
| 3 | 251 | 56.3% |
| 1 | 102 | 22.9% |
| 2 | 93 | 20.9% |
| Value | Count | Frequency (%) |
| 3 | 248 | 55.6% |
| 1 | 109 | 24.4% |
| 2 | 89 | 20.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 251 | 56.3% |
| 1 | 102 | 22.9% |
| 2 | 93 | 20.9% |
| Value | Count | Frequency (%) |
| 3 | 248 | 55.6% |
| 1 | 109 | 24.4% |
| 2 | 89 | 20.0% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 446 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 446 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 3 | 251 | 56.3% |
| 1 | 102 | 22.9% |
| 2 | 93 | 20.9% |
| Value | Count | Frequency (%) |
| 3 | 248 | 55.6% |
| 1 | 109 | 24.4% |
| 2 | 89 | 20.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 446 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 446 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 3 | 251 | 56.3% |
| 1 | 102 | 22.9% |
| 2 | 93 | 20.9% |
| Value | Count | Frequency (%) |
| 3 | 248 | 55.6% |
| 1 | 109 | 24.4% |
| 2 | 89 | 20.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 446 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 446 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 3 | 251 | 56.3% |
| 1 | 102 | 22.9% |
| 2 | 93 | 20.9% |
| Value | Count | Frequency (%) |
| 3 | 248 | 55.6% |
| 1 | 109 | 24.4% |
| 2 | 89 | 20.0% |
Name
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 82 | 82 |
| Median length | 53 | 50 |
| Mean length | 27.464126 | 26.524664 |
| Min length | 12 | 12 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 446 | 446 |
| Unique (%) | 100.0% | 100.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | Johansson, Mr. Karl Johan | Adahl, Mr. Mauritz Nils Martin |
| 2nd row | Cunningham, Mr. Alfred Fleming | Connolly, Miss. Kate |
| 3rd row | Montvila, Rev. Juozas | Rugg, Miss. Emily |
| 4th row | Caldwell, Master. Alden Gates | Williams, Mr. Charles Duane |
| 5th row | Sage, Miss. Stella Anna | Turpin, Mrs. William John Robert (Dorothy Ann Wonnacott) |
Words
| Value | Count | Frequency (%) |
| mr | 254 | 13.8% |
| miss | 89 | 4.8% |
| mrs | 70 | 3.8% |
| william | 30 | 1.6% |
| master | 22 | 1.2% |
| john | 22 | 1.2% |
| henry | 19 | 1.0% |
| charles | 15 | 0.8% |
| george | 13 | 0.7% |
| james | 13 | 0.7% |
| Other values (887) | 1292 | 70.3% |
| Value | Count | Frequency (%) |
| mr | 261 | 14.5% |
| miss | 100 | 5.6% |
| mrs | 57 | 3.2% |
| william | 27 | 1.5% |
| john | 24 | 1.3% |
| master | 18 | 1.0% |
| henry | 16 | 0.9% |
| thomas | 14 | 0.8% |
| mary | 12 | 0.7% |
| charles | 11 | 0.6% |
| Other values (896) | 1254 | 69.9% |
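The word tables are consistent with a simple tokenizer: lower-case the text, split on whitespace, and trim punctuation only at the edges of each token. That would turn "Mr." into "mr" while keeping "c.a", "w./c" and "a/5" intact, as seen in the Ticket words further down. The sketch below assumes exactly that rule; ydata-profiling's own tokenizer, and any stop-word filtering it applies, may differ.

```python
import string
from collections import Counter


def word_counts(texts):
    """Count lower-cased, whitespace-separated tokens.

    Punctuation is trimmed only at the edges of each token, so "Mr." becomes
    "mr" while "C.A." becomes "c.a" and "A/5" stays "a/5".
    """
    counts = Counter()
    for text in texts:
        for token in text.lower().split():
            word = token.strip(string.punctuation)
            if word:
                counts[word] += 1
    return counts


# The five sample names shown for Dataset B.
sample_b = [
    "Adahl, Mr. Mauritz Nils Martin",
    "Connolly, Miss. Kate",
    "Rugg, Miss. Emily",
    "Williams, Mr. Charles Duane",
    "Turpin, Mrs. William John Robert (Dorothy Ann Wonnacott)",
]
print(word_counts(sample_b).most_common(3))
```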
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 1395 | 11.4% |
| r | 1032 | 8.4% |
| e | 860 | 7.0% |
| a | 852 | 7.0% |
| s | 680 | 5.6% |
| i | 669 | 5.5% |
| n | 650 | 5.3% |
| M | 564 | 4.6% |
| l | 530 | 4.3% |
| o | 513 | 4.2% |
| Other values (49) | 4504 | 36.8% |
| Value | Count | Frequency (%) |
| (space) | 1350 | 11.4% |
| r | 932 | 7.9% |
| a | 837 | 7.1% |
| e | 798 | 6.7% |
| s | 665 | 5.6% |
| n | 656 | 5.5% |
| i | 641 | 5.4% |
| M | 563 | 4.8% |
| l | 517 | 4.4% |
| o | 506 | 4.3% |
| Other values (49) | 4365 | 36.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 12249 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 11830 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| (space) | 1395 | 11.4% |
| r | 1032 | 8.4% |
| e | 860 | 7.0% |
| a | 852 | 7.0% |
| s | 680 | 5.6% |
| i | 669 | 5.5% |
| n | 650 | 5.3% |
| M | 564 | 4.6% |
| l | 530 | 4.3% |
| o | 513 | 4.2% |
| Other values (49) | 4504 | 36.8% |
| Value | Count | Frequency (%) |
| (space) | 1350 | 11.4% |
| r | 932 | 7.9% |
| a | 837 | 7.1% |
| e | 798 | 6.7% |
| s | 665 | 5.6% |
| n | 656 | 5.5% |
| i | 641 | 5.4% |
| M | 563 | 4.8% |
| l | 517 | 4.4% |
| o | 506 | 4.3% |
| Other values (49) | 4365 | 36.9% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 12249 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 11830 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| (space) | 1395 | 11.4% |
| r | 1032 | 8.4% |
| e | 860 | 7.0% |
| a | 852 | 7.0% |
| s | 680 | 5.6% |
| i | 669 | 5.5% |
| n | 650 | 5.3% |
| M | 564 | 4.6% |
| l | 530 | 4.3% |
| o | 513 | 4.2% |
| Other values (49) | 4504 | 36.8% |
| Value | Count | Frequency (%) |
| (space) | 1350 | 11.4% |
| r | 932 | 7.9% |
| a | 837 | 7.1% |
| e | 798 | 6.7% |
| s | 665 | 5.6% |
| n | 656 | 5.5% |
| i | 641 | 5.4% |
| M | 563 | 4.8% |
| l | 517 | 4.4% |
| o | 506 | 4.3% |
| Other values (49) | 4365 | 36.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 12249 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 11830 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| (space) | 1395 | 11.4% |
| r | 1032 | 8.4% |
| e | 860 | 7.0% |
| a | 852 | 7.0% |
| s | 680 | 5.6% |
| i | 669 | 5.5% |
| n | 650 | 5.3% |
| M | 564 | 4.6% |
| l | 530 | 4.3% |
| o | 513 | 4.2% |
| Other values (49) | 4504 | 36.8% |
| Value | Count | Frequency (%) |
| (space) | 1350 | 11.4% |
| r | 932 | 7.9% |
| a | 837 | 7.1% |
| e | 798 | 6.7% |
| s | 665 | 5.6% |
| n | 656 | 5.5% |
| i | 641 | 5.4% |
| M | 563 | 4.8% |
| l | 517 | 4.4% |
| o | 506 | 4.3% |
| Other values (49) | 4365 | 36.9% |
Sex
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 6 | 6 |
| Median length | 4 | 4 |
| Mean length | 4.7219731 | 4.7085202 |
| Min length | 4 | 4 |
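For a categorical column the length statistics follow directly from the category counts: each "male" contributes 4 characters and each "female" 6, so the mean length in Dataset A is (285 × 4 + 161 × 6) / 446 = 2106 / 446 ≈ 4.722. The sketch rebuilds each column from the Common Values counts reported below and recomputes the four rows; the helper name is illustrative.

```python
import statistics


def length_stats(values):
    """Max, median, mean and min of string lengths."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
    }


# Rebuild the Sex column of each dataset from its category counts.
sex_a = ["male"] * 285 + ["female"] * 161
sex_b = ["male"] * 288 + ["female"] * 158
print(length_stats(sex_a))
print(length_stats(sex_b))
```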
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | male | male |
| 2nd row | male | female |
| 3rd row | male | female |
| 4th row | male | male |
| 5th row | female | female |
Common Values
| Value | Count | Frequency (%) |
| male | 285 | 63.9% |
| female | 161 | 36.1% |
| Value | Count | Frequency (%) |
| male | 288 | 64.6% |
| female | 158 | 35.4% |
[Figures: histogram of category lengths; common values plot for Dataset A and Dataset B]
Words
| Value | Count | Frequency (%) |
| male | 285 | 63.9% |
| female | 161 | 36.1% |
| Value | Count | Frequency (%) |
| male | 288 | 64.6% |
| female | 158 | 35.4% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 607 | 28.8% |
| m | 446 | 21.2% |
| a | 446 | 21.2% |
| l | 446 | 21.2% |
| f | 161 | 7.6% |
| Value | Count | Frequency (%) |
| e | 604 | 28.8% |
| m | 446 | 21.2% |
| a | 446 | 21.2% |
| l | 446 | 21.2% |
| f | 158 | 7.5% |
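The character table can be derived the same way: "male" contains one e and "female" two, so e appears 285 + 2 × 161 = 607 times in Dataset A, out of 2106 characters in total. A small sketch that uses only the Common Values counts (the helper name is illustrative):

```python
from collections import Counter


def character_counts(category_counts):
    """Character frequencies implied by per-category counts."""
    chars = Counter()
    for value, count in category_counts.items():
        for ch in value:
            chars[ch] += count   # every occurrence of the category adds its letters
    return chars


chars_a = character_counts({"male": 285, "female": 161})
total_a = sum(chars_a.values())
for ch, n in chars_a.most_common():
    print(ch, n, f"{100 * n / total_a:.1f}%")
```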
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 2106 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 2100 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| e | 607 | 28.8% |
| m | 446 | 21.2% |
| a | 446 | 21.2% |
| l | 446 | 21.2% |
| f | 161 | 7.6% |
| Value | Count | Frequency (%) |
| e | 604 | 28.8% |
| m | 446 | 21.2% |
| a | 446 | 21.2% |
| l | 446 | 21.2% |
| f | 158 | 7.5% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 2106 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 2100 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| e | 607 | 28.8% |
| m | 446 | 21.2% |
| a | 446 | 21.2% |
| l | 446 | 21.2% |
| f | 161 | 7.6% |
| Value | Count | Frequency (%) |
| e | 604 | 28.8% |
| m | 446 | 21.2% |
| a | 446 | 21.2% |
| l | 446 | 21.2% |
| f | 158 | 7.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 2106 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 2100 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| e | 607 | 28.8% |
| m | 446 | 21.2% |
| a | 446 | 21.2% |
| l | 446 | 21.2% |
| f | 161 | 7.6% |
| Value | Count | Frequency (%) |
| e | 604 | 28.8% |
| m | 446 | 21.2% |
| a | 446 | 21.2% |
| l | 446 | 21.2% |
| f | 158 | 7.5% |
Age
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 76 | 76 |
| Distinct (%) | 21.8% | 21.3% |
| Missing | 98 | 89 |
| Missing (%) | 22.0% | 20.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 29.328534 | 29.771709 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 0.42 |
| Maximum | 74 | 74 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
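Distinct (%) is computed over the non-missing values only. Age has 76 distinct values among 446 − 98 = 348 recorded ages in Dataset A, which gives 21.8%; dividing by all 446 rows would give 17.0% instead. The same rule reproduces the Cabin figures further down. A minimal check (the helper name is illustrative):

```python
def distinct_pct(n_distinct: int, n_rows: int, n_missing: int) -> str:
    """Distinct values as a share of the non-missing observations."""
    return f"{100 * n_distinct / (n_rows - n_missing):.1f}%"


print(distinct_pct(76, 446, 98))   # Age, Dataset A
print(distinct_pct(76, 446, 89))   # Age, Dataset B
print(distinct_pct(80, 446, 349))  # Cabin, Dataset A
print(distinct_pct(91, 446, 337))  # Cabin, Dataset B
```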
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 0.42 |
| 5-th percentile | 3.35 | 4 |
| Q1 | 19 | 20 |
| median | 28 | 28 |
| Q3 | 38 | 39 |
| 95-th percentile | 57 | 56 |
| Maximum | 74 | 74 |
| Range | 73.58 | 73.58 |
| Interquartile range (IQR) | 19 | 19 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 14.99007 | 14.415345 |
| Coefficient of variation (CV) | 0.51110874 | 0.48419609 |
| Kurtosis | 0.03506182 | -0.081335331 |
| Mean | 29.328534 | 29.771709 |
| Median Absolute Deviation (MAD) | 9 | 9 |
| Skewness | 0.3372763 | 0.322972 |
| Sum | 10206.33 | 10628.5 |
| Variance | 224.70221 | 207.80217 |
| Monotonicity | Not monotonic | Not monotonic |
[Figure: histogram with fixed size bins (bins=50)]
Common values
| Value | Count | Frequency (%) |
| 25 | 16 | 3.6% |
| 21 | 15 | 3.4% |
| 30 | 15 | 3.4% |
| 18 | 15 | 3.4% |
| 32 | 14 | 3.1% |
| 19 | 13 | 2.9% |
| 36 | 12 | 2.7% |
| 22 | 12 | 2.7% |
| 28 | 11 | 2.5% |
| 35 | 11 | 2.5% |
| Other values (66) | 214 | 48.0% |
| (Missing) | 98 | 22.0% |
| Value | Count | Frequency (%) |
| 18 | 18 | 4.0% |
| 24 | 16 | 3.6% |
| 22 | 15 | 3.4% |
| 25 | 13 | 2.9% |
| 21 | 12 | 2.7% |
| 19 | 11 | 2.5% |
| 28 | 11 | 2.5% |
| 30 | 11 | 2.5% |
| 27 | 10 | 2.2% |
| 20 | 9 | 2.0% |
| Other values (66) | 231 | 51.8% |
| (Missing) | 89 | 20.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0.42 | 1 | 0.2% |
| 0.75 | 1 | 0.2% |
| 0.83 | 2 | 0.4% |
| 1 | 4 | 0.9% |
| 2 | 6 | 1.3% |
| 3 | 4 | 0.9% |
| 4 | 8 | 1.8% |
| 5 | 2 | 0.4% |
| 6 | 1 | 0.2% |
| 7 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0.42 | 1 | 0.2% |
| 0.75 | 1 | 0.2% |
| 0.83 | 1 | 0.2% |
| 1 | 3 | 0.7% |
| 2 | 5 | 1.1% |
| 3 | 3 | 0.7% |
| 4 | 5 | 1.1% |
| 5 | 2 | 0.4% |
| 6 | 1 | 0.2% |
| 7 | 2 | 0.4% |
SibSp
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 7 | 7 |
| Distinct (%) | 1.6% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.51569507 | 0.49775785 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 8 | 8 |
| Zeros | 303 | 309 |
| Zeros (%) | 67.9% | 69.3% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 1 | 1 |
| 95-th percentile | 2 | 2 |
| Maximum | 8 | 8 |
| Range | 8 | 8 |
| Interquartile range (IQR) | 1 | 1 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 1.1029991 | 1.0928756 |
| Coefficient of variation (CV) | 2.1388592 | 2.1955968 |
| Kurtosis | 20.129047 | 20.844338 |
| Mean | 0.51569507 | 0.49775785 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 3.9316316 | 4.0005197 |
| Sum | 230 | 222 |
| Variance | 1.216607 | 1.194377 |
| Monotonicity | Not monotonic | Not monotonic |
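The large skewness and kurtosis of SibSp reflect a long right tail: a handful of 5s and 8s against a majority of zeros. The sketch below assumes the report uses the bias-corrected estimators that pandas applies in `Series.skew` (adjusted Fisher-Pearson skewness) and `Series.kurt` (excess kurtosis). It needs at least four values and is run on toy data.

```python
import math


def skew_kurt(values):
    """Bias-corrected sample skewness and excess kurtosis (requires n >= 4)."""
    n = len(values)
    mean = sum(values) / n
    # Central moments with a 1/n normalisation.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    g1 = m3 / m2 ** 1.5          # population skewness
    g2 = m4 / m2 ** 2 - 3        # population excess kurtosis
    skew = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    kurt = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
    return skew, kurt


print(skew_kurt([1, 2, 3, 4, 5]))     # symmetric, light-tailed
print(skew_kurt([0, 0, 0, 0, 1, 8]))  # mostly zeros with one large value
```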
[Figure: histogram with fixed size bins (bins=7)]
Common values
| Value | Count | Frequency (%) |
| 0 | 303 | 67.9% |
| 1 | 109 | 24.4% |
| 2 | 13 | 2.9% |
| 3 | 8 | 1.8% |
| 4 | 6 | 1.3% |
| 8 | 4 | 0.9% |
| 5 | 3 | 0.7% |
| Value | Count | Frequency (%) |
| 0 | 309 | 69.3% |
| 1 | 104 | 23.3% |
| 2 | 12 | 2.7% |
| 3 | 8 | 1.8% |
| 4 | 7 | 1.6% |
| 8 | 4 | 0.9% |
| 5 | 2 | 0.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 303 | 67.9% |
| 1 | 109 | 24.4% |
| 2 | 13 | 2.9% |
| 3 | 8 | 1.8% |
| 4 | 6 | 1.3% |
| 5 | 3 | 0.7% |
| 8 | 4 | 0.9% |
| Value | Count | Frequency (%) |
| 0 | 309 | 69.3% |
| 1 | 104 | 23.3% |
| 2 | 12 | 2.7% |
| 3 | 8 | 1.8% |
| 4 | 7 | 1.6% |
| 5 | 2 | 0.4% |
| 8 | 4 | 0.9% |
Parch
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 7 | 7 |
| Distinct (%) | 1.6% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.42152466 | 0.35201794 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 6 | 6 |
| Zeros | 332 | 341 |
| Zeros (%) | 74.4% | 76.5% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 1 | 0 |
| 95-th percentile | 2 | 2 |
| Maximum | 6 | 6 |
| Range | 6 | 6 |
| Interquartile range (IQR) | 1 | 0 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 0.86473138 | 0.75212535 |
| Coefficient of variation (CV) | 2.0514372 | 2.1366109 |
| Kurtosis | 9.7118903 | 12.371668 |
| Mean | 0.42152466 | 0.35201794 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 2.7362506 | 2.9482384 |
| Sum | 188 | 157 |
| Variance | 0.74776037 | 0.56569255 |
| Monotonicity | Not monotonic | Not monotonic |
[Figure: histogram with fixed size bins (bins=7)]
Common values
| Value | Count | Frequency (%) |
| 0 | 332 | 74.4% |
| 1 | 60 | 13.5% |
| 2 | 46 | 10.3% |
| 5 | 3 | 0.7% |
| 4 | 3 | 0.7% |
| 6 | 1 | 0.2% |
| 3 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 341 | 76.5% |
| 1 | 65 | 14.6% |
| 2 | 35 | 7.8% |
| 4 | 2 | 0.4% |
| 6 | 1 | 0.2% |
| 3 | 1 | 0.2% |
| 5 | 1 | 0.2% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 332 | 74.4% |
| 1 | 60 | 13.5% |
| 2 | 46 | 10.3% |
| 3 | 1 | 0.2% |
| 4 | 3 | 0.7% |
| 5 | 3 | 0.7% |
| 6 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 341 | 76.5% |
| 1 | 65 | 14.6% |
| 2 | 35 | 7.8% |
| 3 | 1 | 0.2% |
| 4 | 2 | 0.4% |
| 5 | 1 | 0.2% |
| 6 | 1 | 0.2% |
Ticket
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 369 | 385 |
| Distinct (%) | 82.7% | 86.3% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 18 | 18 |
| Median length | 17 | 17 |
| Mean length | 6.6681614 | 6.6995516 |
| Min length | 3 | 4 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 311 | 337 |
| Unique (%) | 69.7% | 75.6% |
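The report separates Distinct (how many different values occur) from Unique (how many values occur exactly once). For Ticket in Dataset A, 369 different ticket numbers occur but only 311 of them belong to a single passenger; the others are shared. For a categorical column such as Survived, both values repeat many times, so Unique is 0. A minimal sketch with `collections.Counter`; the ticket list is a toy example built from the sample rows, with one value repeated for illustration.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of distinct values, number of values seen exactly once)."""
    counts = Counter(values)
    n_distinct = len(counts)
    n_unique = sum(1 for c in counts.values() if c == 1)
    return n_distinct, n_unique


print(distinct_and_unique(["347063", "239853", "347063", "CA. 2343"]))  # (3, 2)
print(distinct_and_unique(["0"] * 274 + ["1"] * 172))                  # (2, 0)
```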
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 347063 | C 7076 |
| 2nd row | 239853 | 370373 |
| 3rd row | 211536 | C.A. 31026 |
| 4th row | 248738 | PC 17597 |
| 5th row | CA. 2343 | 11668 |
Words
| Value | Count | Frequency (%) |
| pc | 27 | 4.8% |
| c.a | 12 | 2.1% |
| a/5 | 9 | 1.6% |
| ca | 9 | 1.6% |
| soton/oq | 6 | 1.1% |
| a/4 | 5 | 0.9% |
| w./c | 5 | 0.9% |
| ston/o2 | 4 | 0.7% |
| ston/o | 4 | 0.7% |
| 2 | 4 | 0.7% |
| Other values (389) | 476 | 84.8% |
| Value | Count | Frequency (%) |
| pc | 29 | 5.2% |
| c.a | 12 | 2.2% |
| a/5 | 8 | 1.4% |
| ca | 8 | 1.4% |
| f.c.c | 5 | 0.9% |
| 2 | 5 | 0.9% |
| ston/o | 5 | 0.9% |
| sc/paris | 5 | 0.9% |
| w./c | 4 | 0.7% |
| 1601 | 4 | 0.7% |
| Other values (403) | 473 | 84.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 390 | 13.1% |
| 1 | 335 | 11.3% |
| 2 | 293 | 9.9% |
| 7 | 234 | 7.9% |
| 4 | 232 | 7.8% |
| 6 | 206 | 6.9% |
| 5 | 202 | 6.8% |
| 0 | 198 | 6.7% |
| 9 | 160 | 5.4% |
| 8 | 139 | 4.7% |
| Other values (25) | 585 | 19.7% |
| Value | Count | Frequency (%) |
| 3 | 394 | 13.2% |
| 1 | 361 | 12.1% |
| 2 | 281 | 9.4% |
| 4 | 243 | 8.1% |
| 7 | 240 | 8.0% |
| 6 | 207 | 6.9% |
| 5 | 196 | 6.6% |
| 0 | 196 | 6.6% |
| 9 | 165 | 5.5% |
| 8 | 141 | 4.7% |
| Other values (22) | 564 | 18.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 2974 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 2988 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 3 | 390 | 13.1% |
| 1 | 335 | 11.3% |
| 2 | 293 | 9.9% |
| 7 | 234 | 7.9% |
| 4 | 232 | 7.8% |
| 6 | 206 | 6.9% |
| 5 | 202 | 6.8% |
| 0 | 198 | 6.7% |
| 9 | 160 | 5.4% |
| 8 | 139 | 4.7% |
| Other values (25) | 585 | 19.7% |
| Value | Count | Frequency (%) |
| 3 | 394 | 13.2% |
| 1 | 361 | 12.1% |
| 2 | 281 | 9.4% |
| 4 | 243 | 8.1% |
| 7 | 240 | 8.0% |
| 6 | 207 | 6.9% |
| 5 | 196 | 6.6% |
| 0 | 196 | 6.6% |
| 9 | 165 | 5.5% |
| 8 | 141 | 4.7% |
| Other values (22) | 564 | 18.9% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 2974 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 2988 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 3 | 390 | 13.1% |
| 1 | 335 | 11.3% |
| 2 | 293 | 9.9% |
| 7 | 234 | 7.9% |
| 4 | 232 | 7.8% |
| 6 | 206 | 6.9% |
| 5 | 202 | 6.8% |
| 0 | 198 | 6.7% |
| 9 | 160 | 5.4% |
| 8 | 139 | 4.7% |
| Other values (25) | 585 | 19.7% |
| Value | Count | Frequency (%) |
| 3 | 394 | 13.2% |
| 1 | 361 | 12.1% |
| 2 | 281 | 9.4% |
| 4 | 243 | 8.1% |
| 7 | 240 | 8.0% |
| 6 | 207 | 6.9% |
| 5 | 196 | 6.6% |
| 0 | 196 | 6.6% |
| 9 | 165 | 5.5% |
| 8 | 141 | 4.7% |
| Other values (22) | 564 | 18.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 2974 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 2988 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 3 | 390 | 13.1% |
| 1 | 335 | 11.3% |
| 2 | 293 | 9.9% |
| 7 | 234 | 7.9% |
| 4 | 232 | 7.8% |
| 6 | 206 | 6.9% |
| 5 | 202 | 6.8% |
| 0 | 198 | 6.7% |
| 9 | 160 | 5.4% |
| 8 | 139 | 4.7% |
| Other values (25) | 585 | 19.7% |
| Value | Count | Frequency (%) |
| 3 | 394 | 13.2% |
| 1 | 361 | 12.1% |
| 2 | 281 | 9.4% |
| 4 | 243 | 8.1% |
| 7 | 240 | 8.0% |
| 6 | 207 | 6.9% |
| 5 | 196 | 6.6% |
| 0 | 196 | 6.6% |
| 9 | 165 | 5.5% |
| 8 | 141 | 4.7% |
| Other values (22) | 564 | 18.9% |
Fare
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 182 | 184 |
| Distinct (%) | 40.8% | 41.3% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 31.591143 | 32.636071 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 512.3292 | 512.3292 |
| Zeros | 9 | 7 |
| Zeros (%) | 2.0% | 1.6% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 7.225 | 7.225 |
| Q1 | 7.9031 | 7.8958 |
| median | 15.025 | 14.4271 |
| Q3 | 31 | 32.087475 |
| 95-th percentile | 112.67708 | 108.9 |
| Maximum | 512.3292 | 512.3292 |
| Range | 512.3292 | 512.3292 |
| Interquartile range (IQR) | 23.0969 | 24.191675 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 46.270476 | 51.97986 |
| Coefficient of variation (CV) | 1.4646661 | 1.592712 |
| Kurtosis | 32.041157 | 35.72501 |
| Mean | 31.591143 | 32.636071 |
| Median Absolute Deviation (MAD) | 7.5021 | 6.7375 |
| Skewness | 4.5589325 | 5.0330051 |
| Sum | 14089.65 | 14555.688 |
| Variance | 2140.957 | 2701.9058 |
| Monotonicity | Not monotonic | Not monotonic |
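For Fare the median absolute deviation (about 7.5) is far smaller than the standard deviation (about 46.3): MAD depends only on the middle of the distribution, while the standard deviation is pulled up by a few very large fares (up to 512.3292). The sketch assumes MAD is the unscaled median of absolute deviations from the median, without the 1.4826 consistency factor, which agrees with the PassengerId figure of roughly a quarter of its range. The data are toy values with one outlier.

```python
import statistics


def median_absolute_deviation(values):
    """Unscaled MAD: median of |x - median(values)|."""
    med = statistics.median(values)
    return statistics.median(abs(x - med) for x in values)


fares = [1, 2, 3, 4, 100]   # toy data with one extreme value
print(median_absolute_deviation(fares))  # 1
print(statistics.stdev(fares))           # dominated by the outlier
```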
[Figure: histogram with fixed size bins (bins=50)]
Common values
| Value | Count | Frequency (%) |
| 8.05 | 20 | 4.5% |
| 13 | 19 | 4.3% |
| 7.8958 | 19 | 4.3% |
| 26 | 18 | 4.0% |
| 7.75 | 15 | 3.4% |
| 7.25 | 10 | 2.2% |
| 10.5 | 9 | 2.0% |
| 0 | 9 | 2.0% |
| 7.775 | 8 | 1.8% |
| 7.2292 | 8 | 1.8% |
| Other values (172) | 311 | 69.7% |
| Value | Count | Frequency (%) |
| 8.05 | 24 | 5.4% |
| 7.8958 | 19 | 4.3% |
| 7.75 | 18 | 4.0% |
| 13 | 18 | 4.0% |
| 26 | 16 | 3.6% |
| 10.5 | 12 | 2.7% |
| 26.55 | 10 | 2.2% |
| 7.925 | 9 | 2.0% |
| 7.775 | 8 | 1.8% |
| 7.2292 | 8 | 1.8% |
| Other values (174) | 304 | 68.2% |
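Each common-values table lists the ten most frequent values and folds the rest into an "Other values (k)" row, with missing values reported on a separate line. Both numbers in that row follow from the table itself, as this sketch shows for Fare and for Age in Dataset A; percentages are taken over all 446 rows. The helper name is hypothetical.

```python
def other_values_row(top_counts, n_rows, n_distinct, n_missing=0):
    """Return (k, count, frequency) for the 'Other values (k)' row.

    top_counts: counts of the values listed individually (the top 10).
    n_distinct: number of distinct non-missing values in the column.
    """
    other_count = n_rows - n_missing - sum(top_counts)
    other_distinct = n_distinct - len(top_counts)
    return other_distinct, other_count, f"{100 * other_count / n_rows:.1f}%"


fare_a = [20, 19, 19, 18, 15, 10, 9, 9, 8, 8]
print(other_values_row(fare_a, n_rows=446, n_distinct=182))  # (172, 311, '69.7%')

age_a = [16, 15, 15, 15, 14, 13, 12, 12, 11, 11]
print(other_values_row(age_a, n_rows=446, n_distinct=76, n_missing=98))  # (66, 214, '48.0%')
```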
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 9 | 2.0% |
| 5 | 1 | 0.2% |
| 6.4375 | 1 | 0.2% |
| 6.45 | 1 | 0.2% |
| 6.4958 | 1 | 0.2% |
| 6.75 | 1 | 0.2% |
| 6.8583 | 1 | 0.2% |
| 6.95 | 1 | 0.2% |
| 7.05 | 2 | 0.4% |
| 7.125 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0 | 7 | 1.6% |
| 4.0125 | 1 | 0.2% |
| 6.2375 | 1 | 0.2% |
| 6.4958 | 2 | 0.4% |
| 6.75 | 1 | 0.2% |
| 6.8583 | 1 | 0.2% |
| 6.95 | 1 | 0.2% |
| 6.975 | 1 | 0.2% |
| 7.05 | 2 | 0.4% |
| 7.0542 | 2 | 0.4% |
Cabin
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 80 | 91 |
| Distinct (%) | 82.5% | 83.5% |
| Missing | 349 | 337 |
| Missing (%) | 78.3% | 75.6% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 15 | 15 |
| Median length | 3 | 3 |
| Mean length | 3.7010309 | 3.5321101 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 66 | 74 |
| Unique (%) | 68.0% | 67.9% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | C65 | D49 |
| 2nd row | B4 | F4 |
| 3rd row | D | E17 |
| 4th row | A32 | E8 |
| 5th row | C104 | F E69 |
Words
| Value | Count | Frequency (%) |
| b96 | 4 | 3.4% |
| b98 | 4 | 3.4% |
| f2 | 3 | 2.6% |
| f | 3 | 2.6% |
| c65 | 2 | 1.7% |
| b28 | 2 | 1.7% |
| d | 2 | 1.7% |
| b77 | 2 | 1.7% |
| g73 | 2 | 1.7% |
| b51 | 2 | 1.7% |
| Other values (81) | 91 | 77.8% |
| Value | Count | Frequency (%) |
| g6 | 3 | 2.4% |
| f | 3 | 2.4% |
| f33 | 2 | 1.6% |
| c22 | 2 | 1.6% |
| c26 | 2 | 1.6% |
| d33 | 2 | 1.6% |
| c124 | 2 | 1.6% |
| b18 | 2 | 1.6% |
| c23 | 2 | 1.6% |
| f2 | 2 | 1.6% |
| Other values (92) | 103 | 82.4% |
Most occurring characters
| Value | Count | Frequency (%) |
| B | 39 | 10.9% |
| 2 | 33 | 9.2% |
| 5 | 27 | 7.5% |
| C | 27 | 7.5% |
| 3 | 26 | 7.2% |
| 1 | 26 | 7.2% |
| 6 | 24 | 6.7% |
| 9 | 22 | 6.1% |
| (space) | 20 | 5.6% |
| 8 | 17 | 4.7% |
| Other values (8) | 98 | 27.3% |
| Value | Count | Frequency (%) |
| C | 43 | 11.2% |
| 2 | 41 | 10.6% |
| 1 | 31 | 8.1% |
| 3 | 30 | 7.8% |
| 6 | 29 | 7.5% |
| B | 24 | 6.2% |
| 8 | 22 | 5.7% |
| 4 | 20 | 5.2% |
| D | 19 | 4.9% |
| 7 | 19 | 4.9% |
| Other values (9) | 107 | 27.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 359 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 385 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| B | 39 | 10.9% |
| 2 | 33 | 9.2% |
| 5 | 27 | 7.5% |
| C | 27 | 7.5% |
| 3 | 26 | 7.2% |
| 1 | 26 | 7.2% |
| 6 | 24 | 6.7% |
| 9 | 22 | 6.1% |
| (space) | 20 | 5.6% |
| 8 | 17 | 4.7% |
| Other values (8) | 98 | 27.3% |
| Value | Count | Frequency (%) |
| C | 43 | 11.2% |
| 2 | 41 | 10.6% |
| 1 | 31 | 8.1% |
| 3 | 30 | 7.8% |
| 6 | 29 | 7.5% |
| B | 24 | 6.2% |
| 8 | 22 | 5.7% |
| 4 | 20 | 5.2% |
| D | 19 | 4.9% |
| 7 | 19 | 4.9% |
| Other values (9) | 107 | 27.8% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 359 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 385 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| B | 39 | 10.9% |
| 2 | 33 | 9.2% |
| 5 | 27 | 7.5% |
| C | 27 | 7.5% |
| 3 | 26 | 7.2% |
| 1 | 26 | 7.2% |
| 6 | 24 | 6.7% |
| 9 | 22 | 6.1% |
| (space) | 20 | 5.6% |
| 8 | 17 | 4.7% |
| Other values (8) | 98 | 27.3% |
| Value | Count | Frequency (%) |
| C | 43 | 11.2% |
| 2 | 41 | 10.6% |
| 1 | 31 | 8.1% |
| 3 | 30 | 7.8% |
| 6 | 29 | 7.5% |
| B | 24 | 6.2% |
| 8 | 22 | 5.7% |
| 4 | 20 | 5.2% |
| D | 19 | 4.9% |
| 7 | 19 | 4.9% |
| Other values (9) | 107 | 27.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 359 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 385 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| B | 39 | 10.9% |
| 2 | 33 | 9.2% |
| 5 | 27 | 7.5% |
| C | 27 | 7.5% |
| 3 | 26 | 7.2% |
| 1 | 26 | 7.2% |
| 6 | 24 | 6.7% |
| 9 | 22 | 6.1% |
| (space) | 20 | 5.6% |
| 8 | 17 | 4.7% |
| Other values (8) | 98 | 27.3% |
| Value | Count | Frequency (%) |
| C | 43 | 11.2% |
| 2 | 41 | 10.6% |
| 1 | 31 | 8.1% |
| 3 | 30 | 7.8% |
| 6 | 29 | 7.5% |
| B | 24 | 6.2% |
| 8 | 22 | 5.7% |
| 4 | 20 | 5.2% |
| D | 19 | 4.9% |
| 7 | 19 | 4.9% |
| Other values (9) | 107 | 27.8% |
Embarked
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 2 | 2 |
| Missing (%) | 0.4% | 0.4% |
| Memory size | 7.0 KiB | 7.0 KiB |
Category bar chart (Dataset A and Dataset B): S, C, Q
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | S | S |
| 2nd row | S | Q |
| 3rd row | S | S |
| 4th row | S | C |
| 5th row | S | S |
Common Values
| Value | Count | Frequency (%) |
| S | 319 | 71.5% |
| C | 84 | 18.8% |
| Q | 41 | 9.2% |
| (Missing) | 2 | 0.4% |
| Value | Count | Frequency (%) |
| S | 323 | 72.4% |
| C | 84 | 18.8% |
| Q | 37 | 8.3% |
| (Missing) | 2 | 0.4% |
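The Common Values percentages are taken over all 446 rows, with missing entries counted as a value of their own. A minimal pandas sketch that reproduces them with `Series.value_counts(normalize=True, dropna=False)` is below; it rebuilds Dataset A's Embarked column from the counts in the table and is an illustration, not the report's own code.

```python
import numpy as np
import pandas as pd

# Rebuild Dataset A's Embarked column from the counts in the table above
embarked = pd.Series(["S"] * 319 + ["C"] * 84 + ["Q"] * 41 + [np.nan] * 2)

# dropna=False keeps NaN as its own category, so the denominator is 446
counts = embarked.value_counts(dropna=False)
freq = embarked.value_counts(normalize=True, dropna=False) * 100

print(counts)
print(freq.round(1))
```

For S this gives 319 / 446, which rounds to 71.5%.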
Length
Histogram of lengths of the category
Common Values (Plot)
Dataset A
Dataset B
| Value | Count | Frequency (%) |
| s | 319 | 71.8% |
| c | 84 | 18.9% |
| q | 41 | 9.2% |
| Value | Count | Frequency (%) |
| s | 323 | 72.7% |
| c | 84 | 18.9% |
| q | 37 | 8.3% |
Most occurring characters
| Value | Count | Frequency (%) |
| S | 319 | 71.8% |
| C | 84 | 18.9% |
| Q | 41 | 9.2% |
| Value | Count | Frequency (%) |
| S | 323 | 72.7% |
| C | 84 | 18.9% |
| Q | 37 | 8.3% |
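The character tables use a different denominator from Common Values: they count characters across the 444 non-missing values, so C appears as 84 / 444 ≈ 18.9% here versus 84 / 446 ≈ 18.8% in Common Values. A minimal sketch with `collections.Counter`, again rebuilding Dataset A's values from the counts above (an illustration, not the report's implementation):

```python
from collections import Counter

# Dataset A's non-missing Embarked values, rebuilt from the counts above
values = ["S"] * 319 + ["C"] * 84 + ["Q"] * 41

# Count every character across all values; each value here is one character long
char_counts = Counter(ch for value in values for ch in value)
total = sum(char_counts.values())

for ch, n in char_counts.most_common():
    print(f"{ch} {n} {100 * n / total:.1f}%")
```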
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 444 |
| Value | Count | Frequency (%) |
| (unknown) | 444 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| S | 319 | |
| C | 84 | 18.9% |
| Q | 41 | 9.2% |
| Value | Count | Frequency (%) |
| S | 323 | |
| C | 84 | 18.9% |
| Q | 37 | 8.3% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 444 |
| Value | Count | Frequency (%) |
| (unknown) | 444 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| S | 319 | |
| C | 84 | 18.9% |
| Q | 41 | 9.2% |
| Value | Count | Frequency (%) |
| S | 323 | |
| C | 84 | 18.9% |
| Q | 37 | 8.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 444 |
| Value | Count | Frequency (%) |
| (unknown) | 444 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| S | 319 | |
| C | 84 | 18.9% |
| Q | 41 | 9.2% |
| Value | Count | Frequency (%) |
| S | 323 | |
| C | 84 | 18.9% |
| Q | 37 | 8.3% |
Interactions
Pairwise interaction plots of the numeric variables, shown separately for Dataset A and Dataset B.
Correlations
Dataset A
Dataset B
Dataset A
| | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
|---|---|---|---|---|---|---|---|---|---|
| Age | 1.000 | 0.000 | 0.135 | -0.263 | 0.068 | 0.272 | 0.177 | -0.213 | 0.279 |
| Embarked | 0.000 | 1.000 | 0.187 | 0.028 | 0.000 | 0.260 | 0.086 | 0.082 | 0.188 |
| Fare | 0.135 | 0.187 | 1.000 | 0.419 | -0.022 | 0.486 | 0.157 | 0.448 | 0.251 |
| Parch | -0.263 | 0.028 | 0.419 | 1.000 | -0.028 | 0.000 | 0.251 | 0.471 | 0.140 |
| PassengerId | 0.068 | 0.000 | -0.022 | -0.028 | 1.000 | 0.000 | 0.141 | -0.069 | 0.160 |
| Pclass | 0.272 | 0.260 | 0.486 | 0.000 | 0.000 | 1.000 | 0.120 | 0.099 | 0.380 |
| Sex | 0.177 | 0.086 | 0.157 | 0.251 | 0.141 | 0.120 | 1.000 | 0.185 | 0.549 |
| SibSp | -0.213 | 0.082 | 0.448 | 0.471 | -0.069 | 0.099 | 0.185 | 1.000 | 0.214 |
| Survived | 0.279 | 0.188 | 0.251 | 0.140 | 0.160 | 0.380 | 0.549 | 0.214 | 1.000 |
Dataset B
| | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
|---|---|---|---|---|---|---|---|---|---|
| Age | 1.000 | 0.000 | 0.128 | -0.279 | 0.028 | 0.272 | 0.000 | -0.163 | 0.043 |
| Embarked | 0.000 | 1.000 | 0.213 | 0.000 | 0.000 | 0.253 | 0.044 | 0.052 | 0.118 |
| Fare | 0.128 | 0.213 | 1.000 | 0.397 | -0.014 | 0.493 | 0.092 | 0.431 | 0.243 |
| Parch | -0.279 | 0.000 | 0.397 | 1.000 | -0.065 | 0.000 | 0.216 | 0.385 | 0.088 |
| PassengerId | 0.028 | 0.000 | -0.014 | -0.065 | 1.000 | 0.000 | 0.000 | -0.065 | 0.000 |
| Pclass | 0.272 | 0.253 | 0.493 | 0.000 | 0.000 | 1.000 | 0.134 | 0.122 | 0.318 |
| Sex | 0.000 | 0.044 | 0.092 | 0.216 | 0.000 | 0.134 | 1.000 | 0.173 | 0.498 |
| SibSp | -0.163 | 0.052 | 0.431 | 0.385 | -0.065 | 0.122 | 0.173 | 1.000 | 0.100 |
| Survived | 0.043 | 0.118 | 0.243 | 0.088 | 0.000 | 0.318 | 0.498 | 0.100 | 1.000 |
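The correlation tables mix signed coefficients for numeric pairs (for example Age and Parch) with non-negative values for pairs involving categorical variables such as Sex and Survived. The report does not say which measure produced each cell. Assuming a Cramér's V style association for categorical pairs, a minimal NumPy sketch follows; it omits any bias correction the library may apply, the function name `cramers_v` is hypothetical, and the contingency tables are made-up toy inputs.

```python
import numpy as np


def cramers_v(table):
    """Cramér's V for a 2-D contingency table (no bias correction)."""
    observed = np.asarray(table, dtype=float)
    n = observed.sum()
    # Expected counts under independence: row total * column total / n
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    k = min(observed.shape) - 1
    return float(np.sqrt(chi2 / (n * k)))


print(cramers_v([[10, 0], [0, 10]]))    # perfect association
print(cramers_v([[10, 10], [10, 10]]))  # independence
```

Rows or columns with a zero total would make the expected counts zero, so such categories should be dropped before calling the function.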
Missing values
Dataset A
A simple visualization of nullity by column.
Dataset B
A simple visualization of nullity by column.
Dataset A
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completeness.
Dataset B
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completeness.
Dataset A
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
Dataset B
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
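Nullity correlation, as described above, can be sketched as the Pearson correlation between per-column missingness indicators. The sketch below assumes that definition (similar to the approach in the missingno package, but not the report's own code) and uses a small made-up frame. Columns that are never or always missing are dropped first, because a constant indicator has zero variance and no defined correlation.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values): Age and Cabin tend to be missing together
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan, 35.0, np.nan],
    "Cabin": ["C85", np.nan, "E46", np.nan, "C123", "B42"],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05, 8.46],
})

# 1.0 where a value is missing, 0.0 where it is present
missing = df.isna().astype(float)
# Keep only columns whose missingness actually varies
missing = missing.loc[:, missing.var() > 0]
# Pearson correlation of the missingness indicators
nullity_corr = missing.corr()
print(nullity_corr.round(3))
```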
Sample
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 805 | 806 | 0 | 3 | Johansson, Mr. Karl Johan | male | 31.00 | 0 | 0 | 347063 | 7.7750 | NaN | S |
| 413 | 414 | 0 | 2 | Cunningham, Mr. Alfred Fleming | male | NaN | 0 | 0 | 239853 | 0.0000 | NaN | S |
| 886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.00 | 0 | 0 | 211536 | 13.0000 | NaN | S |
| 78 | 79 | 1 | 2 | Caldwell, Master. Alden Gates | male | 0.83 | 0 | 2 | 248738 | 29.0000 | NaN | S |
| 792 | 793 | 0 | 3 | Sage, Miss. Stella Anna | female | NaN | 8 | 2 | CA. 2343 | 69.5500 | NaN | S |
| 307 | 308 | 1 | 1 | Penasco y Castellana, Mrs. Victor de Satode (Maria Josefa Perez de Soto y Vallejo) | female | 17.00 | 1 | 0 | PC 17758 | 108.9000 | C65 | C |
| 259 | 260 | 1 | 2 | Parrish, Mrs. (Lutie Davis) | female | 50.00 | 0 | 1 | 230433 | 26.0000 | NaN | S |
| 201 | 202 | 0 | 3 | Sage, Mr. Frederick | male | NaN | 8 | 2 | CA. 2343 | 69.5500 | NaN | S |
| 524 | 525 | 0 | 3 | Kassem, Mr. Fared | male | NaN | 0 | 0 | 2700 | 7.2292 | NaN | C |
| 56 | 57 | 1 | 2 | Rugg, Miss. Emily | female | 21.00 | 0 | 0 | C.A. 31026 | 10.5000 | NaN | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 365 | 366 | 0 | 3 | Adahl, Mr. Mauritz Nils Martin | male | 30.0 | 0 | 0 | C 7076 | 7.2500 | NaN | S |
| 289 | 290 | 1 | 3 | Connolly, Miss. Kate | female | 22.0 | 0 | 0 | 370373 | 7.7500 | NaN | Q |
| 56 | 57 | 1 | 2 | Rugg, Miss. Emily | female | 21.0 | 0 | 0 | C.A. 31026 | 10.5000 | NaN | S |
| 155 | 156 | 0 | 1 | Williams, Mr. Charles Duane | male | 51.0 | 0 | 1 | PC 17597 | 61.3792 | NaN | C |
| 41 | 42 | 0 | 2 | Turpin, Mrs. William John Robert (Dorothy Ann Wonnacott) | female | 27.0 | 1 | 0 | 11668 | 21.0000 | NaN | S |
| 350 | 351 | 0 | 3 | Odahl, Mr. Nils Martin | male | 23.0 | 0 | 0 | 7267 | 9.2250 | NaN | S |
| 681 | 682 | 1 | 1 | Hassab, Mr. Hammad | male | 27.0 | 0 | 0 | PC 17572 | 76.7292 | D49 | C |
| 874 | 875 | 1 | 2 | Abelson, Mrs. Samuel (Hannah Wizosky) | female | 28.0 | 1 | 0 | P/PP 3381 | 24.0000 | NaN | C |
| 794 | 795 | 0 | 3 | Dantcheff, Mr. Ristiu | male | 25.0 | 0 | 0 | 349203 | 7.8958 | NaN | S |
| 368 | 369 | 1 | 3 | Jermyn, Miss. Annie | female | NaN | 0 | 0 | 14313 | 7.7500 | NaN | Q |
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 647 | 648 | 1 | 1 | Simonius-Blumer, Col. Oberst Alfons | male | 56.0 | 0 | 0 | 13213 | 35.5000 | A26 | C |
| 588 | 589 | 0 | 3 | Gilinski, Mr. Eliezer | male | 22.0 | 0 | 0 | 14973 | 8.0500 | NaN | S |
| 672 | 673 | 0 | 2 | Mitchell, Mr. Henry Michael | male | 70.0 | 0 | 0 | C.A. 24580 | 10.5000 | NaN | S |
| 430 | 431 | 1 | 1 | Bjornstrom-Steffansson, Mr. Mauritz Hakan | male | 28.0 | 0 | 0 | 110564 | 26.5500 | C52 | S |
| 157 | 158 | 0 | 3 | Corn, Mr. Harry | male | 30.0 | 0 | 0 | SOTON/OQ 392090 | 8.0500 | NaN | S |
| 447 | 448 | 1 | 1 | Seward, Mr. Frederic Kimber | male | 34.0 | 0 | 0 | 113794 | 26.5500 | NaN | S |
| 231 | 232 | 0 | 3 | Larsson, Mr. Bengt Edvin | male | 29.0 | 0 | 0 | 347067 | 7.7750 | NaN | S |
| 486 | 487 | 1 | 1 | Hoyt, Mrs. Frederick Maxfield (Jane Anne Forby) | female | 35.0 | 1 | 0 | 19943 | 90.0000 | C93 | S |
| 186 | 187 | 1 | 3 | O'Brien, Mrs. Thomas (Johanna "Hannah" Godfrey) | female | NaN | 1 | 0 | 370365 | 15.5000 | NaN | Q |
| 685 | 686 | 0 | 2 | Laroche, Mr. Joseph Philippe Lemercier | male | 25.0 | 1 | 2 | SC/Paris 2123 | 41.5792 | NaN | C |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 412 | 413 | 1 | 1 | Minahan, Miss. Daisy E | female | 33.0 | 1 | 0 | 19928 | 90.0000 | C78 | Q |
| 711 | 712 | 0 | 1 | Klaber, Mr. Herman | male | NaN | 0 | 0 | 113028 | 26.5500 | C124 | S |
| 309 | 310 | 1 | 1 | Francatelli, Miss. Laura Mabel | female | 30.0 | 0 | 0 | PC 17485 | 56.9292 | E36 | C |
| 430 | 431 | 1 | 1 | Bjornstrom-Steffansson, Mr. Mauritz Hakan | male | 28.0 | 0 | 0 | 110564 | 26.5500 | C52 | S |
| 496 | 497 | 1 | 1 | Eustis, Miss. Elizabeth Mussey | female | 54.0 | 1 | 0 | 36947 | 78.2667 | D20 | C |
| 147 | 148 | 0 | 3 | Ford, Miss. Robina Maggie "Ruby" | female | 9.0 | 2 | 2 | W./C. 6608 | 34.3750 | NaN | S |
| 867 | 868 | 0 | 1 | Roebling, Mr. Washington Augustus II | male | 31.0 | 0 | 0 | PC 17590 | 50.4958 | A24 | S |
| 641 | 642 | 1 | 1 | Sagesser, Mlle. Emma | female | 24.0 | 0 | 0 | PC 17477 | 69.3000 | B35 | C |
| 225 | 226 | 0 | 3 | Berglund, Mr. Karl Ivar Sven | male | 22.0 | 0 | 0 | PP 4348 | 9.3500 | NaN | S |
| 54 | 55 | 0 | 1 | Ostby, Mr. Engelhart Cornelius | male | 65.0 | 0 | 1 | 113509 | 61.9792 | B30 | C |
Duplicate rows
Dataset A
Dataset does not contain duplicate rows.
Dataset B
Dataset does not contain duplicate rows.
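The duplicate-row check amounts to counting rows that are identical across every column. A minimal pandas sketch with a made-up frame that repeats one row on purpose is below; `DataFrame.duplicated` marks each repeat after the first occurrence, and the grouping step produces a per-row count similar in spirit to the "# duplicates" column (the report's exact implementation is not shown here).

```python
import pandas as pd

# Made-up frame: the first row is deliberately repeated
df = pd.DataFrame({
    "PassengerId": [57, 57, 431],
    "Name": [
        "Rugg, Miss. Emily",
        "Rugg, Miss. Emily",
        "Bjornstrom-Steffansson, Mr. Mauritz Hakan",
    ],
    "Fare": [10.5, 10.5, 26.55],
})

# Number of rows that exactly repeat an earlier row
n_duplicates = int(df.duplicated().sum())

# Group all rows that have at least one copy and count each group;
# dropna=False keeps rows whose keys contain NaN (e.g. a missing Cabin)
dup_groups = (
    df[df.duplicated(keep=False)]
    .groupby(list(df.columns), dropna=False)
    .size()
    .reset_index(name="# duplicates")
)

print(n_duplicates)
print(dup_groups)
```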